Statistical mechanics of systems with heterogeneous agents: Minority Games

We study analytically a simple game theoretical model of heterogeneous interacting agents. We
show that the stationary state of the system is described by the ground state of a disordered spin
model which is exactly solvable within the simple replica symmetric ansatz. Such a stationary state
differs from the Nash equilibrium where each agent maximizes her own utility. The latter turns out
to be characterized by a replica symmetry broken structure. Numerical results fully agree with our
analytic findings.

Statistical mechanics of disordered systems provides analytical and numerical tools for the
description of complex systems, which have found applications in many interdisciplinary areas [1].
When the precise realization of the interactions in a heterogeneous system is not expected to be
crucial for the overall macroscopic behavior, the system itself can be modeled as having random
interactions drawn from an appropriate distribution. Such an approach appears to be very promising
also for the study of systems with many heterogeneous agents, such as markets, which have recently
attracted much interest in the statistical physics community [?]. Indeed it provides a workable
alternative to the so-called representative agent approach of micro-economic theory, where,
assuming that agents are identical, one is led to a theory with one single (representative)
agent [?].

In this Letter we present analytical results for a simple model of heterogeneous interacting
agents, the so-called minority game (MG) [3], which is a toy model of $N$ agents interacting
through a global quantity representing a market mechanism. Agents aim at anticipating market
movements by following a simple adaptive dynamics inspired by Arthur's inductive reasoning [?].
This is based on simple speculative strategies that take advantage of the available public
information concerning the recent market history, which can take the form of one of $P$ patterns.
Numerical studies [?] have shown that the model displays a remarkably rich behavior. The relevant
control parameter [?] turns out to be the ratio $\alpha = P/N$ between the "complexity" of
information $P$ and the number $N$ of agents, and the model undergoes a phase transition with
symmetry breaking [?] independently of the origin of information [?].

We shall limit the discussion of the interpretation of the model, which is discussed at some
length in refs. [?], to a minimum and rather focus on its mathematical structure and on the
analysis of its statistical properties for $N \gg 1$. Our main aim is indeed to show that the
model can be analyzed within the framework of statistical mechanics of disordered systems [?].



We find that dynamical steady states can be mapped onto the ground state properties of a model
very similar to that proposed in ref. [10] in the context of optimal dynamics for attractor neural
networks. There [10] one shows that the minimization of the interference noise is equivalent to
maximizing the dynamical stability of each device composing the system. Conversely, we show that
the individual utility maximization in interacting agents systems is equivalent to the
minimization of a global function. We also find that different learning models lead to different
patterns of replica symmetry breaking.

The model is defined as follows [3]: Agents live in a world which can be in one of $P$ states.
These are labelled by an integer $\mu = 1, \ldots, P$ which encodes all the information available
to agents. For the time being, we follow ref. [?] and assume that this information concerns some
external system, so that $\mu$ is drawn from a uniform distribution $\rho^\mu = 1/P$ in
$\{1, \ldots, P\}$. Each agent $i = 1, \ldots, N$ can choose between one of two strategies,
labeled by a spin variable $s_i \in \{\pm 1\}$, which prescribes an action $a^\mu_{s_i,i}$ for
each state $\mu$. Strategies may be "look up tables", behavioral rules [?] or information
processing devices. The actions $a^\mu_{s,i}$ are drawn from a bimodal distribution
$P(a^\mu_{s,i} = \pm 1) = 1/2$ for all $i$, $s$ and $\mu$, and they will play the role of quenched
disorder. Hence there are only two possible actions, such as "do something" ($a^\mu_{s,i} = 1$)
or "do the opposite" ($a^\mu_{s,i} = -1$). It is convenient to make the dependence on $s$
explicit in $a^\mu_{s,i}$, introducing $\omega^\mu_i$ and $\xi^\mu_i$ so that
$a^\mu_{s_i,i} = \omega^\mu_i + s_i \xi^\mu_i$ [11]. If agent $i$ chooses strategy $s_i$ and her
opponents choose strategies $s_{-i} = \{s_j, j \neq i\}$, in state $\mu$, she receives a payoff

$$ u^\mu_i(s_i, s_{-i}) = -a^\mu_{s_i,i}\, G(A^\mu), \tag{1} $$

where, defining $\Omega^\mu = \sum_i \omega^\mu_i$,

$$ A^\mu = \sum_j a^\mu_{s_j,j} = \Omega^\mu + \sum_j \xi^\mu_j s_j. \tag{2} $$

The function $G(x)$, which describes the market mechanism, is such that $x\,G(x) > 0$ for all $x$,
so that the total payoff to agents, $\sum_i u^\mu_i = -A^\mu G(A^\mu)$, is always negative: the
majority of agents receive a negative payoff whereas only the minority of them gain. Note that the
agent-agent interaction, which comes from the aggregate quantity $G(A^\mu)$, is of mean-field
character.
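The definitions above can be made concrete with a minimal sketch. The function names (`draw_disorder`, `omega_xi`, `aggregate`, `payoffs`) are hypothetical, and the explicit decomposition $\omega = (a_+ + a_-)/2$, $\xi = (a_+ - a_-)/2$ is an assumption: it is the only choice compatible with $a_s = \omega + s\,\xi$ for $s = \pm 1$, but it is not written out in the text.

```python
import random

def draw_disorder(n_agents, n_states, rng):
    """Quenched disorder a[mu][i][k] in {-1, +1}; k = 0 is s = -1, k = 1 is s = +1."""
    return [[[rng.choice((-1, 1)) for _ in range(2)]
             for _ in range(n_agents)] for _ in range(n_states)]

def omega_xi(a_minus, a_plus):
    """Decompose the two actions so that a_s = omega + s * xi for s = -1, +1."""
    return (a_plus + a_minus) / 2, (a_plus - a_minus) / 2

def aggregate(a, mu, spins):
    """A^mu = sum_j a^mu_{s_j, j}: the sum of the actions actually played."""
    return sum(a[mu][j][(s + 1) // 2] for j, s in enumerate(spins))

def payoffs(a, mu, spins, G=lambda x: x):
    """Eq. (1): u_i = -a^mu_{s_i, i} G(A^mu); G defaults to the linear case."""
    A = aggregate(a, mu, spins)
    return [-a[mu][i][(s + 1) // 2] * G(A) for i, s in enumerate(spins)]
```

With the linear choice $G(x) = x$ the payoffs of all agents sum to $-(A^\mu)^2$, which is the sense in which the total payoff is never positive.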

The game defined by the payoffs in Eq. (1) can be analyzed along the lines of game theory [?] by
looking for its Nash equilibria in the strategy space $\{s_j, j = 1, \ldots, N\}$. Before doing
this, we prefer to discuss the dynamics of inductive agents following refs. [?]: There, the game
is repeated many times and agents try to estimate empirically which of their two strategies is
the best one, using past observations. More precisely, each agent $i$ assigns a score
$U_{s,i}(t)$ to her $s$-th strategy at time $t$, and we assume, as in ref. [13], that she chooses
that strategy with probability [?]

$$ \pi_{s,i}(t) = \mathrm{Prob}\{s_i(t) = s\} = C\, e^{\Gamma U_{s,i}(t)} \tag{3} $$

with $C^{-1} = \sum_{s'} e^{\Gamma U_{s',i}(t)}$ and $\Gamma > 0$. The scores are initially set to
$U_{s,i}(0) = 0$, and they are updated as

$$ U_{s,i}(t+1) = U_{s,i}(t) - a^{\mu(t)}_{s,i}\, G\!\left(A^{\mu(t)}\right)/P. \tag{4} $$

The idea is that if a strategy $s$ has predicted the right sign, i.e. if
$a^{\mu(t)}_{s,i} = -\mathrm{sign}\, G(A^{\mu(t)})$, its score, and hence its probability of being
used, increases. Note that $-a^\mu_{s,i} G(A^\mu)$ in Eq. (4) is not the payoff
$u^\mu_i(s, s_{-i})$ which agent $i$ would have received if she had actually played strategy
$s \neq s_i(t)$. Indeed $G(A^\mu)$ depends on the strategy $s_i(t)$ that agent $i$ has actually
played through $A^\mu$. Agents in the MG neglect this effect and behave as if they were facing an
external process $G(A^\mu)$ rather than playing against $N - 1$ other agents. This may seem
reasonable for $N \gg 1$, since the relative dependence of aggregate quantities on each agent's
choice is expected to be small. We shall see below (see Eq. (11)) that this is not true: if agents
consider the impact of their actions on $A^\mu$, the collective behavior changes considerably.
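The inductive dynamics of Eqs. (3) and (4) can be sketched as a short simulation. This is an illustration under stated assumptions, not the code used for the numerical results: the name `simulate_mg`, the seed handling, the restriction to the linear case $G(x) = x$ and the choice of averaging $A^2$ over the second half of the run are all hypothetical. Only the score difference $U_{+,i} - U_{-,i}$ is stored, since Eq. (3) depends on nothing else.

```python
import math
import random

def simulate_mg(n_agents, n_states, gamma, n_steps, seed=0):
    """Iterate Eqs. (3)-(4) with G(x) = x and exogenous, uniformly drawn mu.

    Returns the time average of A^2 / N over the second half of the run.
    """
    rng = random.Random(seed)
    # a[mu][i] = (action of strategy s = -1, action of strategy s = +1)
    a = [[(rng.choice((-1, 1)), rng.choice((-1, 1)))
          for _ in range(n_agents)] for _ in range(n_states)]
    y = [0.0] * n_agents  # y_i = U_{+,i} - U_{-,i}, initially zero
    total, count = 0.0, 0
    for t in range(n_steps):
        mu = rng.randrange(n_states)
        spins = []
        for i in range(n_agents):
            # Eq. (3) for two strategies: Prob{s_i = +1} = 1 / (1 + exp(-gamma * y_i))
            p_plus = 0.5 * (1.0 + math.tanh(0.5 * gamma * y[i]))
            spins.append(1 if rng.random() < p_plus else -1)
        A = sum(a[mu][i][(s + 1) // 2] for i, s in enumerate(spins))
        for i in range(n_agents):
            # Eq. (4) for s = +1 minus Eq. (4) for s = -1
            y[i] -= (a[mu][i][1] - a[mu][i][0]) * A / n_states
        if t >= n_steps // 2:
            total += A * A
            count += 1
    return total / (count * n_agents)
```

A call such as `simulate_mg(21, 8, 1.0, 4000)` gives one noisy estimate of $\sigma^2/N$ at $\alpha = 8/21$ for a single realization of the disorder; averaging over realizations would be needed for a comparison with analytic curves.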

We focus on the linear case $G(x) = x$, which allows for a simple treatment. Other choices, such
as the original one $G(x) = \mathrm{sign}\, x$, lead to similar conclusions, as will be discussed
elsewhere [15]. With this choice, the total loss of agents is $-\sum_i u^\mu_i = (A^\mu)^2$. The
time average $\sigma^2$ of $(A^\mu)^2$ is shown in Fig. 1 as a function of $\alpha = P/N$. The
system shows a complex behavior characterized, among other things, by a phase transition at
$\alpha_c \simeq 0.34$ [?], where $\sigma^2$ shows a cusp, and by a small $\alpha$ phase where
$\sigma^2$ increases with $\Gamma$ [13].

In order to uncover this behavior, let us focus on the long time behavior of the dynamics. The
key observation is that, in the long run, the score of a strategy depends on its performance in
all $P$ states. Hence, the behavior of agents will change systematically only on time-scales of
order $P$. This suggests introducing the rescaled time $\tau = t/P$. As $P \to \infty$, any
finite interval $d\tau = \Delta t/P$ is made of infinitely many time steps and we can use the law
of large numbers to approximate time averages with statistical averages over the variables
$\mu(t)$ and $s_i(t)$ from their respective distributions $\rho^\mu$ and $\pi_{s,i}$. We
henceforth use the notation $\overline{o} = \sum_\mu \rho^\mu o^\mu$ for averages over $\mu$ and
$\langle \cdot \rangle$ for averages over $s_i(t)$, and we define
$m_i(\tau) = \langle s_i(t) \rangle$. With this notation, $\sigma^2$ reads:

$$ \sigma^2 = \overline{\langle A^2 \rangle} = \overline{\Omega^2} + \sum_i \overline{\xi_i^2} + 2\sum_i \overline{\Omega \xi_i}\, m_i + \sum_{i \neq j} \overline{\xi_i \xi_j}\, m_i m_j, \tag{5} $$

where we have used the statistical independence of the $s_i$, i.e.
$\langle s_i s_j \rangle = m_i m_j + (1 - m_i^2)\,\delta_{i,j}$. The evolution of the scores
$U_{s,i}$ in continuum time $\tau$ is obtained by iterating Eq. (4) for $\Delta t = P\, d\tau$
time steps. Using Eq. (3) in the form $m_i = \tanh[\Gamma(U_{+,i} - U_{-,i})]$, we find

$$ \frac{dm_i}{d\tau} = -2\Gamma\,(1 - m_i^2)\left[\overline{\Omega \xi_i} + \sum_j \overline{\xi_i \xi_j}\, m_j\right]. \tag{6} $$

This can be easily written as a gradient descent dynamics
$\frac{dm_i}{d\tau} = -\Gamma\,(1 - m_i^2)\,\frac{\partial H}{\partial m_i}$ which minimizes the
Hamiltonian

$$ H = \overline{\langle A \rangle^2}. \tag{7} $$
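The step from Eq. (6) to the gradient form can be checked term by term. The lines below are a worked sketch that uses only $\langle A^\mu \rangle = \Omega^\mu + \sum_j \xi^\mu_j m_j$, which follows from $a^\mu_{s_i,i} = \omega^\mu_i + s_i \xi^\mu_i$; the last line compares the result with Eq. (5).

```latex
\begin{align*}
H = \overline{\langle A \rangle^2}
  &= \overline{\Omega^2} + 2\sum_i \overline{\Omega\xi_i}\, m_i
     + \sum_{i,j} \overline{\xi_i\xi_j}\, m_i m_j, \\
\frac{\partial H}{\partial m_i}
  &= 2\left[\overline{\Omega\xi_i} + \sum_j \overline{\xi_i\xi_j}\, m_j\right], \\
-\Gamma\,(1 - m_i^2)\,\frac{\partial H}{\partial m_i}
  &= -2\Gamma\,(1 - m_i^2)\left[\overline{\Omega\xi_i}
     + \sum_j \overline{\xi_i\xi_j}\, m_j\right], \\
\sigma^2 - H &= \sum_i \overline{\xi_i^2}\,(1 - m_i^2) \;\ge\; 0 .
\end{align*}
```

The third line reproduces the right-hand side of Eq. (6). The fourth shows that $\sigma^2$ and $H$ differ only by the fluctuation term, which vanishes when every agent plays a pure strategy ($m_i^2 = 1$).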



As a function of $m_i$, $H$ is a positive definite quadratic form, which has a unique minimum.
This implies that the stationary state of the MG is described by the ground state properties of
$H$. It is easy to see [15] that $H$ is closely related to the order parameter
$\theta = \sqrt{\overline{\langle \mathrm{sign}\, A \rangle^2}}$ introduced in [?], which is a
measure of the predictability of the system. Indeed $H \propto \theta^2$ when $\theta$ is small,
suggesting that inductive agents actually minimize predictability rather than their collective
losses $\sigma^2$.
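That the flow of Eq. (6) lowers $H$ can be checked numerically on a small instance. The sketch below is illustrative only: the names `couplings`, `hamiltonian` and `relax` are hypothetical, the integration is a plain explicit Euler scheme, and the step `dt` is assumed small enough for the system size used (for $N \lesssim 10$ and $\Gamma = 1$, `dt = 0.01` keeps every $m_i$ inside $(-1, 1)$).

```python
import random

def couplings(n_agents, n_states, rng):
    """Pattern averages c = mean(Omega^2), h_i = mean(Omega xi_i), J_ij = mean(xi_i xi_j)."""
    c, h = 0.0, [0.0] * n_agents
    J = [[0.0] * n_agents for _ in range(n_agents)]
    for _ in range(n_states):
        pairs = [(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n_agents)]
        omega = sum((plus + minus) / 2 for minus, plus in pairs)
        xi = [(plus - minus) / 2 for minus, plus in pairs]
        c += omega * omega / n_states
        for i in range(n_agents):
            h[i] += omega * xi[i] / n_states
            for j in range(n_agents):
                J[i][j] += xi[i] * xi[j] / n_states
    return c, h, J

def hamiltonian(m, c, h, J):
    """H of Eq. (7), expanded in terms of the pattern averages."""
    n = len(m)
    return (c + 2 * sum(h[i] * m[i] for i in range(n))
            + sum(J[i][j] * m[i] * m[j] for i in range(n) for j in range(n)))

def relax(m, h, J, gamma=1.0, dt=0.01, n_steps=2000):
    """Explicit Euler integration of Eq. (6), starting from magnetizations m."""
    n = len(m)
    for _ in range(n_steps):
        field = [h[i] + sum(J[i][j] * m[j] for j in range(n)) for i in range(n)]
        m = [m[i] - dt * 2 * gamma * (1 - m[i] ** 2) * field[i] for i in range(n)]
    return m
```

Starting from $m_i = 0$, which corresponds to the initial condition $U_{s,i}(0) = 0$, the value returned by `hamiltonian` after `relax` should not exceed its initial value.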

It is possible to study the ground state properties of $H$ in Eq. (7) using the replica
method [?]. First we introduce an inverse temperature $\beta$ [?] and compute the average over
the disorder variables $\Xi = \{a^\mu_{s,i}\}$ of the partition function of $n$ replicas of the
system, $\langle Z^n \rangle_\Xi$. Next we perform an analytic continuation for non-integer values
of $n$, thus obtaining $\langle \ln Z \rangle_\Xi = \lim_{n \to 0} (\langle Z^n \rangle_\Xi - 1)/n$.
The 'free energy' $F_{ID} = -\langle \ln Z \rangle_\Xi / \beta$ depends on the overlap matrix
$Q_{a,b} = \frac{1}{N}\sum_i m_i^a m_i^b$ ($a, b = 1, \ldots, n$, $a \neq b$) and on the order
parameter $Q_a = \frac{1}{N}\sum_i (m_i^a)^2$, together with their Lagrange multipliers $r_{a,b}$
and $R_a$ respectively. $F_{ID}$ can be calculated using a saddle point method that, within the
replica symmetric (RS) ansatz $Q_{a,b} = q$, $r_{a,b} = r$ (for all $a < b$), and $Q_a = Q$,
$R_a = R$ (for all $a$), leads to

$$ F^{(RS)}_{ID} = \frac{\alpha}{2}\,\frac{1 + q}{\alpha + \beta(Q - q)} + \frac{\alpha}{2\beta}\log\left[1 + \frac{\beta(Q - q)}{\alpha}\right] + \frac{\beta}{2}(RQ - rq) - \frac{1}{\beta}\int d\Phi(\zeta)\, \log \int_{-1}^{1} ds\, e^{-\beta U(s|\zeta)}, $$

where $U(s|\zeta) = \frac{\beta}{2}(r - R)\,s^2 - \sqrt{r}\,\zeta s$ and $\Phi$ is the normal
distribution. The ground state properties of $H$ are obtained by solving the saddle point
equations [?] in the limit $\beta \to \infty$. Fig. 1 compares the analytic and numerical
findings for $\sigma^2$. For $\alpha > \alpha_c = 0.33740\ldots$, the solution leads to
$Q = q < 1$ and a ground state energy $H_0 > 0$. $H_0 \to 0$ as $\alpha \to \alpha_c^+$ and
$H_0 = 0$ for $\alpha < \alpha_c$.
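The value of $\alpha_c$ can be recovered with a few lines of code, under assumptions that should be read as such. The closed $\beta \to \infty$ equations are not written out in the text; the sketch assumes the usual zero-temperature reduction of the RS saddle point, in which $m = \zeta/z$ clipped to $[-1, 1]$ with $\zeta$ a standard normal variable and $z = \sqrt{\alpha/(1 + Q)}$, so that $Q = \langle m^2 \rangle$, and in which the susceptibility diverges where $\mathrm{Erf}(z/\sqrt{2}) = \alpha$. The function names are hypothetical.

```python
import math

def q_of_z(z):
    """Q = <m^2> for m = clip(zeta / z, -1, 1) with zeta ~ N(0, 1)."""
    e = math.erf(z / math.sqrt(2))
    return (1 - math.sqrt(2 / math.pi) * math.exp(-z * z / 2) / z
            - (1 - 1 / z ** 2) * e)

def alpha_of_z(z):
    """Invert z = sqrt(alpha / (1 + Q)) to get alpha as a function of z."""
    return z * z * (1 + q_of_z(z))

def critical_point(lo=0.3, hi=0.5, tol=1e-12):
    """Bisection on Erf(z / sqrt(2)) = alpha(z); returns (z_c, alpha_c)."""
    def f(z):
        return math.erf(z / math.sqrt(2)) - alpha_of_z(z)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    z = (lo + hi) / 2
    return z, alpha_of_z(z)

def sigma2_per_agent(z):
    """sigma^2 / N = H_0 / N + (1 - Q) / 2 on the branch alpha > alpha_c."""
    alpha, q = alpha_of_z(z), q_of_z(z)
    e = math.erf(z / math.sqrt(2))
    h0 = 0.5 * (1 + q) * (1 - e / alpha) ** 2
    return h0 + 0.5 * (1 - q)
```

Under these assumptions `critical_point()` reproduces the quoted $\alpha_c = 0.33740\ldots$, and `sigma2_per_agent(z)` approaches 1, the value for agents choosing at random, as $z$ and hence $\alpha$ grows.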

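This confirms the conclusion $\langle A^\mu \rangle = 0$ $\forall \mu$ (or $\theta = 0$) for
$\alpha < \alpha_c$ and it implies the relation

$$ \frac{\sigma^2}{N} = \frac{1 - Q}{2}, \qquad \alpha < \alpha_c. \tag{8} $$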



The RS solution is stable against replica symmetry breaking (RSB) for any $\alpha$, as expected
from the positive definiteness of $H$. Following ref. [10], we compute the probability
distribution of the strategies, which for $\alpha > \alpha_c$ is bimodal and assumes the
particularly simple form

$$ \mathcal{P}(m) = \frac{\phi(z)}{2}\left[\delta(m - 1) + \delta(m + 1)\right] + \frac{z}{\sqrt{2\pi}}\, e^{-(zm)^2/2} \tag{9} $$

with $z = \sqrt{\alpha/(1 + Q)}$ ($Q$ taking its saddle point value) and where
$\phi(z) = 1 - \mathrm{Erf}(z/\sqrt{2})$ is the fraction of frozen agents (those who always play
one and the same strategy), so that $\mathcal{P}(m)$ is normalized on $[-1, 1]$. Below
$\alpha_c$, $\mathcal{P}(m)$ is continuous, i.e. $\phi = 0$, in agreement with numerical findings.

At the transition the spin susceptibility $\chi = \lim_{\beta \to \infty} \beta(Q - q)$ diverges
as $\alpha \to \alpha_c^+$ and it remains infinite for all $\alpha < \alpha_c$. This is because
the ground state is degenerate in many directions (zero modes) and an infinitesimal perturbation
can cause a finite shift in the equilibrium values of $m_i$. This implies that, in the long run,
the dynamics (6) leads to an equilibrium state which depends on the initial conditions
$U_{s,i}(t = 0)$. The under-constrained nature of the system is also responsible for the
occurrence of anti-persistent effects for $\alpha < \alpha_c$ [?]. The periodic motion in the
subspace $H = 0$ is probably induced by inertial terms $d^2 U_{s,i}/d\tau^2$ which we have
neglected, and which require a more careful study of dynamical solutions of Eqs. (3,4). It is
however clear that the amplitude of the excursion of $U_{+,i}(t) - U_{-,i}(t)$ decreases with
$\Gamma$, by the smoothing effect of Eq. (3). When this amplitude becomes of the same order as
$1/\Gamma$, anti-persistence is destroyed, which explains the sudden drop of $\sigma^2$ with
$\Gamma$ found in ref. [13].

A natural question arises: is this state individually optimal, i.e. is it a Nash equilibrium of
the game where agents maximize the expected utility
$u_i = -\overline{\langle a_{s_i,i} A \rangle}$? One way to find the Nash equilibria is to
consider stationary solutions of the multi-population replicator dynamics [17]. This takes the
form of an equation for the so-called mixed strategies, i.e. for the probabilities $\pi_{s,i}$
with which agent $i$ plays strategy $s$. In terms of $m_i = \pi_{+,i} - \pi_{-,i}$, with a little
algebra, these equations [17] read

$$ \frac{dm_i}{d\tau} = (1 - m_i^2)\,\frac{\partial u_i}{\partial m_i}. \tag{10} $$

Observing that $\frac{\partial u_i}{\partial m_i} = -\frac{1}{2}\frac{\partial \sigma^2}{\partial m_i}$,
we can rewrite Eq. (10) as a gradient descent dynamics which minimizes a global function which is
exactly the total loss $\sigma^2$ of agents. Nash equilibria then correspond to the local minima
of $\sigma^2$ in the domain $[-1, 1]^N$. The quadratic form $\sigma^2$ is not positive definite,
which means that there are many local minima and the Nash equilibrium is not unique. It is easy
to see [?] that Nash equilibria are in pure strategies, i.e. $m_i^2 = 1$ $\forall i$, which
implies $\sigma^2 = H$, by Eq. (5). A detailed characterization of the Nash equilibria shall be
given elsewhere [15]. The best Nash equilibrium can be studied by applying the replica method to
$\sigma^2$ for $\beta \to \infty$. The multiplicity of Nash equilibria (meta-stable states)
manifests itself in the occurrence of replica symmetry breaking for any $\alpha > 0$ with a
non-vanishing $\sigma^2/N$ [15]. The simple RS solution, though incorrect, provides a close lower
bound $F^{(RS)}_N = F^{(RS)}_{ID} + \frac{1}{2}(1 - Q)$ to $\sigma^2/N$ for $\beta \to \infty$
(see Fig. 1). For $\alpha > 1/\pi$, we have $Q = q = 1$ and
$F^{(RS)}_N(\beta \to \infty) = [1 - 1/\sqrt{\pi\alpha}]^2$ positive, whereas $1 = Q > q$ and
$F^{(RS)}_N = 0$ for $\alpha < 1/\pi$.
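The correspondence between Nash equilibria and local minima of $\sigma^2$ can be illustrated on pure strategies. The sketch below is a plain best-response search, not the replica calculation and not the procedure behind Fig. 1; the names `sigma2`, `utility` and `best_response_descent` are hypothetical. It relies on the fact that, for $G(x) = x$ and $a^2 = 1$, a unilateral flip by agent $i$ changes $\sigma^2$ by exactly $-2$ times the change in her own utility, so every profitable deviation lowers $\sigma^2$ and the search must stop.

```python
import random

def sigma2(a, spins):
    """Pattern average of (A^mu)^2 when every agent plays a pure strategy."""
    total = 0
    for row in a:
        A = sum(row[j][(s + 1) // 2] for j, s in enumerate(spins))
        total += A * A
    return total / len(a)

def utility(a, spins, i):
    """u_i = -mean_mu a^mu_{s_i,i} A^mu, with agent i's own action included in A."""
    total = 0
    for row in a:
        A = sum(row[j][(s + 1) // 2] for j, s in enumerate(spins))
        total -= row[i][(spins[i] + 1) // 2] * A
    return total / len(a)

def best_response_descent(a, spins):
    """Let agents switch strategy one at a time while the switch pays off."""
    spins = list(spins)
    improved = True
    while improved:
        improved = False
        for i in range(len(spins)):
            before = utility(a, spins, i)
            spins[i] = -spins[i]
            if utility(a, spins, i) > before:
                improved = True       # keep the profitable deviation
            else:
                spins[i] = -spins[i]  # undo: the deviation does not pay
    return spins
```

The returned configuration is a pure-strategy Nash equilibrium of the instance `a`, where `a[mu][i]` holds the pair of actions of agent `i` in state `mu`. Different starting points will in general end in different equilibria, in line with the multiplicity discussed above.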

Fig. 1 shows that in a Nash equilibrium agents perform way better than in the MG. This is the
consequence of the fact that agents do not take into account their impact on the market (i.e. on
$A^\mu$) when they update the scores of their strategies by Eq. (4). It is indeed known [18] that
reinforcement-learning dynamics based on Eq. (3) is closely related to the replicator dynamics
and hence it converges to rational expectation outcomes, i.e. to Nash equilibria. More precisely,
ref. [18] suggests that this occurs if Eq. (4) is replaced with

$$ U_{s,i}(t+1) = U_{s,i}(t) + u_i^{\mu(t)}\bigl(s, s_{-i}(t)\bigr)/P. \tag{11} $$

Now $U_{s,i}(t)$ is proportional to the cumulated payoff that agent $i$ would have received had
she always played strategy $s$ (with other agents playing what they actually played) until time
$t$. As Fig. 1 again shows, this leads to results which coincide with those of the Nash
equilibrium. It is remarkable that the (relative) difference between Eqs. (4) and (11) is small,
i.e. of order $1/A^\mu \sim 1/\sqrt{N}$. Yet, it is not negligible because, when averaged over all
states $\mu$, it produces a finite effect, especially for $\alpha < \alpha_c$, and it affects
considerably the nature of the stationary state. This term has the same origin as the cavity
reaction term in spin glasses [1]. In order to follow Eq. (11), agents need to know the payoff
they would have received for any strategy $s$ they could have played. That may not be realistic
in complex situations where agents know only the payoffs they receive and are unable to
disentangle their contribution from $G(A^\mu)$. However, agents can account approximately for
their impact on the market by adding a cavity term $+\eta\,\delta_{s,s_i(t)}$ to Eq. (4), which
"rewards" the strategy $s_i(t)$ used with respect to those $s \neq s_i(t)$ not used. The most
striking effect of this new term, as discussed elsewhere [15] in detail, is that for
$\alpha < \alpha_c$ an infinitesimal $\eta > 0$ is sufficient to cause RSB and to reduce
$\sigma^2/N$ by a finite amount.
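The difference between the two score updates can be written out explicitly in the linear case $G(x) = x$. The following worked step is a sketch that uses only $(a^\mu_{s,i})^2 = 1$ and the decomposition $a^\mu_{\pm,i} = \omega^\mu_i \pm \xi^\mu_i$, for which $\omega^\mu_i \xi^\mu_i = 0$:

```latex
\begin{align*}
u_i^{\mu}\bigl(s, s_{-i}(t)\bigr)
  &= -a^{\mu}_{s,i}\Bigl[A^{\mu} - a^{\mu}_{s_i(t),i} + a^{\mu}_{s,i}\Bigr]
   = -a^{\mu}_{s,i} A^{\mu} + a^{\mu}_{s,i}\, a^{\mu}_{s_i(t),i} - 1, \\
u_i^{\mu}\bigl(s, s_{-i}(t)\bigr) + a^{\mu}_{s,i} A^{\mu}
  &= \begin{cases}
       0, & s = s_i(t), \\
       -2\,(\xi_i^{\mu})^2, & s \neq s_i(t).
     \end{cases}
\end{align*}
```

The correction is of order one, against a term $a^\mu_{s,i} A^\mu$ of order $\sqrt{N}$, but it always has the same sign: the strategy that was not played is penalized by $2(\xi^\mu_i)^2$, whose average over $\mu$ is close to 1. Since adding a constant to both scores does not change Eq. (3), this acts like a reward for the strategy actually played, which is the form of the cavity term $+\eta\,\delta_{s,s_i(t)}$ with $\eta$ of order one.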
So far, the information $\mu$ was randomly and independently drawn at each time $t$ from the
distribution $\rho^\mu = 1/P$. In the original version of the MG [3], $\mu$ is instead
endogenously determined by the collective dynamics of agents: $\mu(t)$ indeed labels the sequence
of the last $M = \log_2 P$ "minority" signs, i.e. $\mu(t+1) = [2\mu(t) + 1] \bmod P$ if
$A^{\mu(t)} > 0$ and $\mu(t+1) = [2\mu(t)] \bmod P$ otherwise. The idea [3] is that the
information refers to the recent past history of the market, and agents try to guess trends and
patterns in the time evolution of the process $G(A^\mu)$. We may say that $\mu(t)$ is endogenous
information, since it refers to the market itself, as opposed to the exogenous information case
discussed above.

Numerical simulations [?] show that the collective behavior of the MG, based on Eq. (4), under
endogenous information is the same as that under exogenous information. Within our approach, the
relevant feature of the dynamics of $\mu(t)$ is its stationary state distribution $\rho^\mu$. The
key point is that a finite fraction $1 - \phi$ of agents behave stochastically ($m_i^2 < 1$)
because $Q < 1$. As a consequence, $A^\mu$ has stochastic fluctuations of order
$\sqrt{N(1 - Q)}$ which are of the same order as its average $\langle A^\mu \rangle \sim \sqrt{H}$.
With endogenous information, these fluctuations of $A^\mu$ induce a dynamics of $\mu(t)$ which is
ergodic in the sense that typically each $\mu$ is visited with a frequency
$\rho^\mu \simeq 1/P$ in the stationary state [?]. The situation changes completely when agents
follow Eq. (11). Indeed the system converges to a Nash equilibrium where agents play in a
deterministic way, i.e. $m_i^2 = 1$ (or $Q = \phi = 1$). The noise due to the stochastic choice of
$s_i$ by Eq. (3) is totally suppressed. The system becomes deterministic and the dynamics of
$\mu(t)$ locks into some periodic orbit. The ergodicity assumption then breaks down: only a small
number $\tilde{P} \ll P$ of patterns $\mu$ are visited in the stationary state of the system,
whereas the others never occur ($\rho^\mu = 0$). This leads to an effective reduction of the
parameter $\alpha \to \tilde{\alpha} = \tilde{P}/N$, which further diminishes $\sigma^2$.
Numerical simulations show that $\tilde{P} \propto \sqrt{P}$, which implies that
$\tilde{\alpha} \to 0$ in the limit $P = \alpha N \to \infty$, i.e. $\sigma^2/N \to 0$.
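The endogenous update of $\mu(t)$ and the locking into a periodic orbit can be sketched in a few lines. The names `next_history` and `visited_on_cycle` are hypothetical, states are labelled $0, \ldots, P - 1$ rather than $1, \ldots, P$ so that the modulo rule applies directly, and the deterministic outcome in each state is passed in as an assumed table `sign_of_A` rather than computed from a game.

```python
def next_history(mu, A, n_states):
    """mu(t+1) = [2 mu(t) + 1] mod P if A > 0, else [2 mu(t)] mod P."""
    bit = 1 if A > 0 else 0
    return (2 * mu + bit) % n_states

def visited_on_cycle(sign_of_A, mu0, n_states):
    """Follow the deterministic map from mu0 and return the states on the
    periodic orbit it locks into (the transient is discarded)."""
    seen = {}
    mu, t = mu0, 0
    while mu not in seen:
        seen[mu] = t
        mu = next_history(mu, sign_of_A[mu], n_states)
        t += 1
    start = seen[mu]  # time at which the orbit was first entered
    return {state for state, when in seen.items() if when >= start}
```

When the outcome in each state is fixed, the trajectory can only visit the states on one cycle of the map, which is the mechanism behind the reduced number $\tilde{P}$ of visited patterns.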

In summary we have shown how methods of statistical physics of disordered systems can
successfully be applied to study models of interacting heterogeneous agents. Our results extend
easily to more general models [15] and, more importantly, the key ideas can be applied to more
realistic models of financial markets, where heterogeneities arise e.g. from asymmetric
information.

FIG. 1. $\sigma^2/N$ versus $\alpha = P/N$ for $P = 2^6$ for inductive dynamics (full squares),
for the numerical minimization of Eq. (7) (open squares), corrected inductive dynamics (full
diamonds) and the ground state of $\sigma^2$ (open diamonds). The full and the dashed lines are
the corresponding analytic results. Averages are taken over 200 realizations.